InTrack project - Theoretical framework, design, and methods: A study protocol

Sports performance is the result of a complex interaction between individual and environmental factors. The purpose of this paper is to explain the methods used in the InTrack Project, a cross-sectional and cross-cultural project developed to investigate the variance in the performance of runners from different countries and to understand whether differences in performance can be explained by micro-level (athletes' characteristics and proximal environment), meso-level (the distal environment that plays a relevant role in the relationships established at the micro-level), and macro-level (environmental features that shape countries' characteristics) variables. The sample will comprise runners, of both sexes, from four countries. Data collection will be performed in two steps: i) individual information and ii) country-level information. At the individual level, data will be obtained from an online survey. At the country level, data will be obtained from available secondary sources (demographic, social, and economic variables). Statistical procedures expected to be used include multilevel analysis, latent class analysis, and additive and multiplicative interaction in regression models. This wealth of information is relevant to fill gaps regarding the existence of variables that connect different levels of information, and to provide scientific support about environmental characteristics important for predicting runners' performance within and between countries.


Introduction
Athletes' performance is a dynamic, non-linear, and multidimensional phenotype [1], characterized as being complex and multi-faceted. These characteristics reinforce the use of holistic approaches to understand this phenomenon better and more deeply. From the set of available theories, Bronfenbrenner's ecological systems theory (1977, 2011) has previously been highlighted. First presented in 1977 [2], the theory was developed as a critique of experimental psychology and designed to provide a new approach to studying children's development. As most studies were unidirectional, treating behavior as an outcome of the environment, the theory suggests that children's development must be studied considering the subject-context interplay [2]. Further, the theory proposes that the variables responsible for the expression of a given behavior are derived from different levels/contexts, which are positioned in a hierarchical structure (from the closest to the furthest from the subject) and interact within and between levels.
Moving beyond the borders of psychology, ecological systems theory has been applied in sports science, supporting studies from auxology to sports performance [3,4]. Since athletes' performance results from the interaction between variables from different levels/environments, which can be located either close to or distant from athletes, the use of ecological systems theory to understand the expression of this outcome has expanded [5]. In summary, the different levels include the micro-level (intrapersonal and training characteristics, plus the proximal environment, such as the coach-athlete dyad and support from family and friends); the meso-level (where athletes' direct relationships are not observed, but whose environment plays a relevant role in the relationships established at the micro-level, such as sports clubs and sports federations); and the macro-level (environmental features that shape sports systems, including cultural, economic, and demographic indicators) [6][7][8][9].
The interplay of these different levels accounts for the differences in the expression of athletes' performance within and between countries, since athletes live in different natural, social, and cultural environments [10][11][12]. In other words, between-country differences result from the interaction of different domains, including (but not limited to) the political system, economy, education, cultural factors, military systems, and living conditions (e.g., poverty levels, human development indicators) [13,14]. As a result, differences in sports performance are expected at the international level, given that inputs (e.g., economic support, sports culture, dissemination, and access) differ between countries. For example, in some African countries, most budgets are perpetually stretched by urgent needs, such as conflict resolution, hunger, and poverty eradication [14], meaning that, notwithstanding the role sports can play in these societies, other needs require more investment. In addition, hidden features, such as the shared attitudes and cultural values within groups, can be related to sports representativeness or its absence.
Studies aiming to explain international sports success through countries' characteristics have shown that population size, human development index (HDI), and political systems can explain half of the differences in performance between countries [15]. The HDI, an index that comprises health, income, and access to formal education, is considered one of the most important variables that, in association with cultural boundaries, foster an atmosphere of sports development [16]. In the Brazilian context, a positive association was found between states' HDI and the likelihood of being a soccer player in a first-division club [16,17]. Similarly, athletes from cities with high HDI were six times more likely to become swimmers than those born in cities with lower HDI [18]. Specifically, in the running context, the 20 best sprinters ranked worldwide (2006 to 2016) were from countries with high HDI, while those competing in endurance events (10,000 meters or above) came from low/middle HDI countries [6]. These results were related to training specificities and to the African phenomenon, which biased the results of endurance events.
Despite the interest in understanding the complex features of athletes' environments, research focusing on the role of country-related variables is still scarce. Given that sports performance is a "global race" [12,19], and that athletes' success can be developed, studies investigating similarities/differences between countries are of relevance [19]. The Sports Policy factors Leading to International Sporting Success project, developed in 15 countries participating in the summer Olympic Games, showed that financial support is an input to countries' performance (i.e., number of medals), and that each country operates in a unique system with specific environmental features [20]. In addition, the authors highlighted the importance of considering cultural aspects, as well as of studying specific sports disciplines [21].
Running is considered a low-cost practice with easy access that can be performed under a diversity of geographical conditions [22]. The increase in the number of participants in running events [23,24] and the barriers broken in recent years [25] have increased the interest in understanding factors related to runners' performance, including, but not limited to, genetic parameters [26,27] and physiological [28], anthropometric [29], physical fitness [30,31], and training characteristics [32,33]. However, beyond this athlete-centered approach, the context to which athletes belong must be considered an important factor related to their performance. For example, the debate on the dominance of Kenyan and Ethiopian athletes in long-distance disciplines considers not only the exceptional characteristics of the athletes, but also the historical background, possibilities for economic progress, and potential for social rise and better living conditions through sport [34]. Although these factors work as an important input for running, there is no information about the mechanisms by which macro-level characteristics affect the outputs (athletes' performance). In this sense, cross-level interactions need to be investigated to understand in depth which variables are capable of connecting different levels.
Taking into account within- and between-country differences, it is helpful to understand the specific environments that act differently on the development of runners' performance. This wealth of information is relevant to fill some gaps regarding the existence of variables that connect different levels of information, that is, characteristics at a higher level that can be linked to environmental features and training opportunities at lower levels; and also to provide scientific support for public policy programs with benefits at both the individual and societal levels. Accordingly, the main question guiding this project is: Is there performance variance between runners from different countries? If there is, can these differences be explained by micro-, meso-, and macro-level variables?

Specific purposes
• To verify differences in runners' profiles within and between countries;
• To identify whether different clusters, derived from economic and social support, predict within- and between-country differences in running performance;
• To test cross-level interactions, considering additive and multiplicative effects, to predict runners' performance;
• To verify the non-linear relationship between country-level variables and runners' performance through Network models.

Conceptual model
Fig 1 summarizes the conceptual approach of the project. This model was designed to indicate: 1) the hierarchical structure between micro-, meso-, and macro-levels; 2) the integration between different levels; and 3) the relative importance of the variables within and between levels. The core of the figure highlights the main phenotype in which we are interested: running performance. The definition of performance is debated in the scientific context [35]. In the present project, considering the design and logistical boundaries, performance will be understood as a product (output), because running performance is developed based on the inputs. The nested structures consider the information/variables from different levels. At the same time, the gradient colors and the pattern of the model were designed based on an athletics track, which means that the further away from the core (center), the smaller the direct influence of the variables in the explanation of runners' performance variance. In summary, these ideas highlight that different variables, at different levels, are more or less connected, and running performance emerges from this interplay. The cross-sectional line highlights the interplay between levels and can be associated with a "start line" on the athletics track. Therefore, it is an unfinished model, which was not designed to be a theoretical model.

PLOS ONE

Design and sample
This is a cross-sectional and cross-cultural study, named InTrack project (https://www.intrackproject.com/). The sample will comprise runners, of both sexes, from different countries. The expected sample size is estimated to be at least 80 participants in each country (sample size estimated based on the Gpower 3. Considering that data will be collected using a web survey, for informed consent, all participants must click on the option that indicates "I have read and agree to participate in the research". Participants will receive a copy of the consent form by email. This procedure was approved by the ethical committee.

Eligibility criteria
Athletes' eligibility criteria: participants must self-classify as runners; be aged ≥18 years; have taken part in at least one official competition in the 12 months prior to data collection; and answer the online questionnaire. Athletes who do not answer all the mandatory questions of the questionnaire (i.e., country of residence and running pace) will be excluded during the data analysis process.
Countries' eligibility criteria: the inclusion of a country in the project is conditional on approval by the local ethics committee.
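The athlete-level criteria above can be expressed as a simple screening step. The following is a minimal sketch, not the project's actual pipeline: the `Respondent` fields and function names are hypothetical (the protocol does not define a data schema), and the age threshold is assumed to be 18 years or older.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    # Hypothetical field names; the protocol does not specify a schema.
    self_classified_runner: bool
    age_years: int
    competed_last_12_months: bool
    country_of_residence: Optional[str]
    running_pace_s_per_km: Optional[float]


def is_eligible(r: Respondent) -> bool:
    """Inclusion: self-classified runner, adult (assumed >= 18), recent official competition."""
    return r.self_classified_runner and r.age_years >= 18 and r.competed_last_12_months


def has_mandatory_answers(r: Respondent) -> bool:
    """Mandatory items named in the protocol: country of residence and running pace."""
    return bool(r.country_of_residence) and r.running_pace_s_per_km is not None


def analysis_sample(respondents: List[Respondent]) -> List[Respondent]:
    """Keep only eligible respondents who answered the mandatory questions."""
    return [r for r in respondents if is_eligible(r) and has_mandatory_answers(r)]
```

Keeping eligibility and completeness as separate checks makes it possible to report how many respondents were excluded for each reason.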

Data collection procedures
The research will occur in a virtual environment, from participant recruitment to data collection. The research will be disseminated through contact with the athletics federations of the countries, higher education institutions, social media, the personal social networks of the research team, and prior contact with sports clubs during the years 2022/2023. After a participant shows interest in taking part in the research, a link to the questionnaire and the consent form will be sent. Only after agreeing to participate in the research, by signing the consent form, will the participant have access to the questionnaire. The participant is allowed to stop answering the questionnaire at any point or to abstain from answering individual questions. On completing the questionnaire, the participant will receive a message indicating that it has been completed. Within 12 months of the data collection, the personal information gathered for this study will be pseudonymized, and within/after 18 months, it will be completely anonymized. True anonymization renders information non-personal.
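The two-stage handling of personal information described above can be illustrated as follows. This is a sketch under stated assumptions: the protocol does not specify the pseudonymization method, so the keyed hash (HMAC-SHA256), the `email` identifier field, and the function names are all hypothetical choices made for illustration.

```python
import hashlib
import hmac


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "email") -> dict:
    """Replace the direct identifier with a keyed hash (assumed method, not the protocol's)."""
    out = {k: v for k, v in record.items() if k != id_field}
    digest = hmac.new(secret_key, record[id_field].encode("utf-8"), hashlib.sha256)
    # The pseudonym can only be recomputed by someone who holds secret_key.
    out["pseudonym"] = digest.hexdigest()
    return out


def anonymize(pseudonymized: dict) -> dict:
    """Drop the pseudonym so the record can no longer be linked via the key."""
    return {k: v for k, v in pseudonymized.items() if k != "pseudonym"}
```

Removing the pseudonym alone may not be sufficient for true anonymization if the remaining variables (e.g., age, country, and income combined) could still single out a person; that assessment is outside the scope of this sketch.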

Microsystem-level
Athletes' information will be obtained through a web survey, shared through the Google Forms platform. This strategy has been largely used in different research fields [36,37], and cross-cultural research showed equivalence between paper and web-survey modes of administration [38]. The questionnaire was developed by the authors based on the theoretical framework of the project [2,39] and previous instruments [40,41].
The questionnaire comprises five domains and 37 items, and participants will spend approximately 15 minutes completing it. The questionnaire provides information about runner identification (age; sex); anthropometric variables (body height; body mass); sociodemographic profile (country of current residence; monthly income; educational level; marital status); training characteristics (volume, duration, and frequency/week; sessions/day; practice time; running pace); involvement in an official running event; motivation for the practice; relationship with coaches; and perception of contextual support (family, friends, coach, training facilities). Mandatory questions include information about the country of residence and running pace (outcome variable).
The questionnaire's psychometric quality will be tested through evaluation by experts and also by a pilot test among Brazilian runners. Items and domains will be assessed considering "content", "objectivity", "clarity", "readability", and "understanding of the content" [42]. Subsequently, for use in non-Portuguese-speaking countries, a translation followed by a back translation will be performed. A cultural equivalence assessment may also be performed.

Mesosystem information
Mesosystem information will be obtained from free access web pages and documents for each country. The following information will be gathered: the existence of programs for the selection and development of sports talent, the number of high-performance athletes in the ranking of the sports disciplines at national and international (world) levels, the number of sports clubs, the number of high-performance competitions, and sports investment in the modality. Additional information may be gathered where available.

Macro-level information
Demographic, economic, and sports financial support data will be obtained from free access web pages and official documents with open and unrestricted access from each country (National Institute of Statistics of the countries; national datasets). The following information will be gathered: a) population size and density; b) HDI; c) gross domestic product; d) per capita income; e) annual sports investment; f) the number of high-performance athletes in the sport's ranking at international (world) level; g) cultural aspects. Cultural information refers to Hofstede's six dimensions [43]: power distance, individualism, masculinity, uncertainty avoidance, long-term orientation, and indulgence.
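Because country-level indicators come from secondary sources while runner-level data come from the survey, the two must be linked by country before any nested analysis. A minimal sketch of that linkage is given below; the function name, the field names, and the `country_` prefix are hypothetical, and the values are placeholders rather than real country data.

```python
def attach_country_level(runners, country_table):
    """Copy each runner's country-level variables onto the runner record.

    Returns (merged, unmatched): runners whose country has no entry in
    country_table are returned separately instead of being dropped silently.
    """
    merged, unmatched = [], []
    for runner in runners:
        country_vars = country_table.get(runner["country"])
        if country_vars is None:
            unmatched.append(runner)
            continue
        row = dict(runner)  # copy, so the survey record is left unchanged
        for name, value in country_vars.items():
            row["country_" + name] = value
        merged.append(row)
    return merged, unmatched


# Placeholder values for illustration only; they are not real country data.
country_table = {
    "A": {"hdi": 0.80, "population": 1_000_000},
    "B": {"hdi": 0.70, "population": 5_000_000},
}
runners = [
    {"runner_id": 1, "country": "A", "pace_s_per_km": 300.0},
    {"runner_id": 2, "country": "B", "pace_s_per_km": 330.0},
    {"runner_id": 3, "country": "C", "pace_s_per_km": 310.0},
]
merged, unmatched = attach_country_level(runners, country_table)
```

Every runner from the same country receives identical country-level values, which is the nested structure that the multilevel analysis is meant to account for.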

Expected statistical procedures
Descriptive information will be presented as the five-number summary (minimum, first quartile, median, third quartile, and maximum), mean (standard deviation), and frequencies (%). Given the nested structure of the data, multilevel analysis will be used to address the main question of the InTrack project. Multilevel analysis, also named hierarchical linear models, linear mixed-effect models, mixed models, or nested data models [44], considers the data organization at different levels. The first important piece of information is the intraclass correlation coefficient, which measures the proportion of the total variance that can be attributed to group differences [45]. If this assumption is confirmed, two levels will be estimated: level one (runners) and level two (countries) (Fig 2). The main idea is to estimate whether there are statistically significant differences between runners from different countries, and to determine the performance variance within and between countries, as well as the individual and environmental factors associated with running performance.
Latent Class Analysis (LCA) will be used for the secondary aims. LCA is a person-centered statistical procedure used to cluster subgroups that share specific characteristics [46]. LCA uses categorical variables, and in the present project, we intend to use this procedure to verify how athletes are grouped within and between countries based on different variables, such as economic aspects, training characteristics, and environmental perception, as well as to find whether there is a class with the highest probability of presenting better performance. The confidence interval will be fixed at 95%.
In addition, a Network analysis will be used to identify the relationship between country-level variables and runners' performance. Network analysis is a set of integrated techniques to detail relations between variables/subjects and to analyse the structures that emerge from these relations [47]. For the present project, this analysis will allow the understanding of the non-linear interactions between the variables presented in our conceptual model and the identification of variables that are hubs (i.e., "bridges") between micro- and macro-levels. In addition, multiple regression models will be performed, considering both additive and multiplicative effects, to predict runners' performance within and between countries.
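The distinction between additive and multiplicative interaction can be shown with a small worked example. The sketch assumes two binary factors (for instance, one country-level and one runner-level characteristic) and strictly positive mean outcomes in each of the four cells; the protocol does not state how the interaction terms will be parameterized, so the function below is illustrative only.

```python
def interaction_contrasts(m00, m10, m01, m11):
    """Interaction between two binary factors from the four cell means.

    m_ab is the mean outcome with factor 1 at level a and factor 2 at level b.
    Additive scale: difference of differences (0 means no interaction).
    Multiplicative scale: ratio of ratios (1 means no interaction).
    """
    additive = m11 - m10 - m01 + m00
    multiplicative = (m11 * m00) / (m10 * m01)
    return additive, multiplicative


# Effects combine exactly multiplicatively (1.2 x 1.5 = 1.8 times the baseline),
# yet the additive contrast is non-zero.
print(interaction_contrasts(10, 12, 15, 18))  # (1, 1.0)
```

In a saturated regression with both factors and their product, the additive contrast corresponds to the product-term coefficient of a linear model, and the multiplicative contrast to the exponentiated product-term coefficient of a model for the log outcome. The example shows that the absence of interaction on one scale does not imply its absence on the other.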

Broadcasting findings
We will disseminate the study results to all interested parties (e.g., participants, stakeholders, partners). Results will be submitted for publication in peer-reviewed journals and presented at scientific events (e.g., congresses, meetings, and symposiums). Authorship of publications will be based on contributions toward design, data collection, analysis, and manuscript writing. All participants' data will be anonymized (i.e., encrypted) to protect confidentiality. The project and the scientific articles will be developed following the guidelines of Improving the quality of Web surveys: the Checklist for Reporting Results of Internet E-Surveys [48].

Discussion
Moving beyond the idea that athletes are not randomly distributed between different places, the shared values within a population, the national pride, and sports cultural values present important significance for athletes' commitment to training. For the present project, we do not neglect the importance of genetic, physiological, psychological, and dietetic variables that influence runners' performance [49,50]; instead, we are using different lenses to understand this complex phenomenon. In this sense, the project presents weaknesses, threats, strengths, and opportunities that must be highlighted.

Weakness, threats, strengths, and opportunities
The most important weakness is related to the research design. In cross-cultural studies, the risk of noise needs to be considered in different steps of the project, starting with the unit of analysis. The unit of analysis defines the scope of the research design. For the present study, the unit of analysis is the country. Although we assume that countries are different, differences in cultural, economic, and social aspects are also found within countries, especially in countries with larger populations and territories. In this sense, the generalization of future findings needs careful consideration.
Methodological issues include data collection (i.e., procedures and instruments), which is expected to be standardized between countries. For the present study, dissemination of the research will be carried out using the same procedure in all countries. However, the use of a questionnaire for data collection is prone to bias, since some inequivalence can be seen between countries. This means that a given question can be understood differently by participants from different countries (and even from the same country) due to different backgrounds and perspectives. To reduce this bias, the development of the questionnaire was based on the theoretical framework, considering the opinions of different researchers from the different countries that are part of the project. In addition, statistical procedures to test the internal consistency of the instrument will be performed. It is important to note that the project was not designed to provide representative data from each participating country. Instead, the methodological approach was designed to allow data collection in different countries using a low-cost strategy. However, we are conscious of the limitations of using this method.
Another important limitation is the "black box problem", since we are unable to understand the mechanism between inputs and outputs. Therefore, we intend to explore the moderator role of variables situated at the meso- and micro-levels that can be related to training commitment and runners' performance. A further problem is the ecological fallacy, defined as a problem of confounding [51]. This means that individual estimates are based on aggregated data for a group, which affects the inferences that can be drawn; for example, cultural values obtained at the country level do not mean that all runners share the same trait/behavior. Last, but not least, we must be careful in investigating different countries. Culture, values, beliefs, and the countries' pro-community atmosphere are difficult to measure. However, when comparing countries we are also adopting "hidden characteristics" as control variables. In addition, as we are using secondary data to represent the meso- and macro-levels, the potential to explore variables depends on the availability and currency of the information.
Threats to the project include the use of an online questionnaire for data collection. Although widely used, this strategy can present some barriers to access and reduce the return rate, limiting the sample size and the generalization of the findings. In addition, a body of evidence is available about the bias associated with self-reported information. These biases include over- and under-reporting of information, social expectations, recall, and confirmation bias [52]. However, the bias can be lower for training characteristics, since about eight out of 10 runners [53] use devices to record and monitor training. In addition, by using a cross-sectional approach, we are not considering the dynamic characteristics of performance.
Besides the weaknesses and threats, our strengths include the use of hierarchically structured information and the exploration of possible relationships between these different levels and athletes' performance. Results obtained from the project can advance the understanding of environmental factors associated with performance. Comparing countries may provide insights into the differences in international sports success and engagement. The last important aspect may border on hyperbole, but the positive impact of sports on human development must not be underestimated.